Automatic Metrics for Genre-specific Text Quality
نویسنده
چکیده
To date, researchers have proposed different ways to compute the readability and coherence of a text using a variety of lexical, syntax, entity and discourse properties. But these metrics have not been defined with special relevance to any particular genre but rather proposed as general indicators of writing quality. In this thesis, we propose and evaluate novel text quality metrics that utilize the unique properties of different genres. We focus on three genres: academic publications, news articles about science, and machine generated text, in particular the output from automatic text summarization systems.
منابع مشابه
Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality
Genre and domain are well known covariates of both manual and automatic annotation quality. Comparatively less is known about the effect of sentence types, such as imperatives, questions or fragments, and how they interact with text type effects. Using mixed effects models, we evaluate the relative influence of genre and sentence types on automatic and manual annotation quality for three relate...
متن کاملVariation of Word Frequencies across Genre Classification Tasks
This paper examines automated genre classification of text documents and its role in enabling the effective management of digital documents by digital libraries and other repositories. Genre classification, which narrows down the possible structure of a document, is a valuable step in realising the general automatic extraction of semantic metadata essential to the efficient management and use o...
متن کاملThe Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
متن کاملCross-Domain Dutch Coreference Resolution
This article explores the portability of a coreference resolver across a variety of eight text genres. Besides newspaper text, we also include administrative texts, autocues, texts used for external communication, instructive texts, wikipedia texts, medical texts and unedited new media texts. Three sets of experiments were conducted. First, we investigated each text genre individually, and stud...
متن کاملAutomated captioning of television programs: development and analysis of a soundtrack corpus
The purpose of this research is to investigate methods for applying speech recognition techniques to improve the productivity of off-line captioning for television. We posit that existing corpora for training continuous speech recognisers are unrepresentative of the acoustic conditions of television soundtracks. To evaluate the use of application specific models to this task we have developed a...
متن کامل